Abstract

This text is based on a translation of a chapter 
in a handbook about network analysis (published 
in German) where we tried to make beginners fa- 
miliar with some basic notions and recent devel- 
opments of network analysis applied to bibliomet- 
ric issues (Havemann and Scharnhorst 2010). We 
have added some recent references. 
1 Introduction

Bibliometrics is a research field that deals with 
the statistical analysis of bibliographic informa- 
tion. Most bibliometric research can be catego- 
rized as scientometric, because it relates to scien- 
tific publication output, in particular to journal 
articles. Informetrics captures flows of information
not just within, but also beyond the world of
books and periodicals, including communication 
over the Web (Webometrics) and the Internet. 1 

The bibliographic description of a written work 
contains a number of elements such as the name(s) 
of the author(s), title of the piece, keywords and 
data necessary for locating the document (e.g., 
title of the journal or edited volume in which an 
article appears, year of publication, volume/issue 
number, page numbers). This information is often 
collected and archived in databases. All of these 
elements constitute the bibliographic attributes of 
a document, also called the metadata. 

1 For an introduction to these overlapping fields, we
refer the reader to the Introduction to Informetrics, a
textbook by Egghe and Rousseau (1990), to the Handbook
of Quantitative Science and Technology Research, edited
by Moed, Glänzel, and Schmoch (2004), to Bibliometrics
and Citation Analysis, the most recent English-language
monograph by De Bellis (2009), and to the recent
open-access electronic book by Havemann (2009), Einführung
in die Bibliometrie (Introduction to Bibliometrics). Some
of the sections in this article are essentially abbreviated
versions of sections contained in chapter three of
Havemann's book. Finally, as a good starting point for
approaching the subject, we also refer the reader to the
lecture text of Wolfgang Glänzel (2003), Bibliometrics as
a Research Field.

A document's attributes are connected to one
another through the document itself: author(s)
to journal, keywords to publication date, etc.
These connections of different attributes generate
bipartite networks, which can be represented
as rectangular matrices. Like attributes, such as
authors with authors (see section 7 below, on
co-authorship networks) or keywords with keywords
(see section 6 below, on co-word analysis), can
also be coupled to one another; this kind of
coupling produces unipartite networks, represented
by square, symmetric matrices.

Scientific publications are characterized by the 
fact that, as a rule, they contain references to 
other scientific works. This generates a further 
network, namely, that of the publications
themselves. This kind of self-reference has also been
taken up in models of the scientific publication
process (Bruckner et al. 1990; Gilbert 1997;
Leydesdorff 2001; Morris et al. 2003; Lucio-Arias and
Leydesdorff 2009).

After Eugene Garfield's pioneering work on the
Science Citation Index (SCI) in the 1960s, the SCI
was used not only to navigate information but also
as a tool in the analysis of science (Garfield 1955;
de Solla Price 1965; Wouters 1999). The growing
accessibility of machine-readable data (the SCI was
already available on magnetic tape relatively early
on) and the appearance of the second big information
network, the World Wide Web (WWW), made large-scale
automated data processing and analysis possible.
The WWW — in which pages constitute the net- 
work nodes and (hyper-)links the edges — has itself 
become a popular object of research in network 
analysis (Huberman 2001). 

Link analysis also proves fruitful when applied
to academic institutions or countries, as was
shown by Thelwall (2004, 2009) and by Ortega
et al. (2008). The position of links in a
network allows us to draw conclusions about
research collaboration and the function of
different national scientific systems in the
international research landscape.

In addition, the Web remains today a medium 
for accessing data, in part freely, which we can 
analyze according to their own particular network
structures. Consider, for example, the
bio-information databases, eBay, or Facebook.
Data-flow networks of this kind have resulted in the
evolution of a new specialty within statistical physics,
namely, that of complex networks (Scharnhorst 
2003; Morris and Yen 2004; Pyka and Scharnhorst 
2009). Within the multidisciplinary setting of network
science, the methods of social network analysis
(SNA), originally developed for smaller social
networks, are combined with statistical analysis 
and dynamic modeling from physics, with com- 
puter science algorithms for data mining and vi- 
sualization, and with graph theory in mathemat- 
ics, for the purpose of better grasping, explaining, 
and mastering existing complex networks in na- 
ture and society (NRC 2005). In recent years, bib- 
liometrics and scientometrics have been strongly 
influenced by these developments in new network 
research (Börner et al. 2007). We will examine
this more closely in section 8 below. In what fol- 
lows immediately, however, we confine ourselves 
to the network descriptions that have tradition- 
ally received special attention in scientometrics. 

In scientometrics we distinguish roughly be- 
tween the analysis of texts and the analysis of 
actors. Even if information on both elements is 
contained in a single bibliographic record, histor- 
ically speaking, most bibliometric network analy- 
ses have concentrated on text elements (viz., ci- 
tation, co-citation, bibliographic coupling, or se- 
mantic networks). This stands in contrast to 
the analysis of scientific collaboration networks — 
either for cooperation on an individual level or 
cooperation between countries (Wagner 2008). 
Among the more interesting methods that have 
evolved in the intermediate field of text and ac- 
tor is the HistCite method developed by Eugene 
Garfield et al., which we discuss in section 2 be- 
low. 

2 Citation Networks 
of Articles 

Articles in scientific periodicals build on earlier
knowledge and acknowledge this by referencing
earlier articles and other publications. In this way
they form networks. This view of the periodical
literature was already advanced more
than 40 years ago by Derek J. de Solla Price
(1965). Defining articles as nodes or vertices and 
citations as the links or edges of a network graph 
allows us to apply graph theoretical methods to 
bibliometrics. In view of the social nature of the 
science system, we will also consider how terms 
and concepts initially developed for SNA can be 
fruitfully applied to explain scientific communica- 
tion. 
New nodes with directed edges pointing to
previously available nodes are continually added to
both the network of journal articles and the Web. 
Whereas web pages can be modified (or even com- 
pletely deleted), a journal article remains an un- 
changed document once it has been published. 
Corrections, for example, can only be added to the 
document subsequently as errata or corrigenda. 
Hyperlinks can be added to existing pages retroac- 
tively, referring to pages that have been subse- 
quently produced. In citation networks, by con- 
trast, there is temporal ordering; but the order, 
as it were, is not strict. Because of the protract- 
edness of the publication process, authors may be 
aware of not-yet-published works and cite these,
but citation network analysis usually does not 
take this into account. 

Today's vast network of scientific journal arti- 
cles began to build up when the referencing of 
older publications became common practice. It 
might be helpful to try to visualize the spatial
expansion of the whole article citation network as
a continually growing sphere which adds a new
"growth ring" every year, in which articles are
located through the sources that they cite. The
HistCite method developed by Eugene Garfield
et al. (2003) assumes that, at the root of every
citation network, there is a single piece of
path-breaking research, namely, the major work
of one scientist or the core works emanating from
a specialist group or specialty area. Successful
strands of research can thus be extracted or
distilled from this work. In easily accessible networks
of frequently cited articles, the paths of scientific
insight and knowledge are clearly visible. This
method can also be used to show connections
between different scientific schools and communities
or their relative isolation from each other
(Lucio-Arias and Scharnhorst 2012).

Figure 1: Temporally ordered graph of a citation
network for the first twelve articles on N-rays.
Data source: de Solla Price (1965)

As Lucio-Arias and Leydesdorff (2008) show, a
main path analysis of these temporally ordered 
graphs reveals the mechanisms of dissemination 
and diversification (diffusion), as well as those of 
consolidation and standardization (codification) 
of scientific knowledge. This kind of citation- 
based historiography complements biographic and 
scientific-historic investigation and, in so doing, 
bridges the graph-theoretical concept of networks 
and the role of social networks in structuralist so- 
cial theory (Merton 1957). 

If we determine the actual location of a publica- 
tion only on the basis of information given in the 
sources cited, then essentially we forgo other in- 
formation in the document, which may be equally 
crucial for this determination. Therefore, a
reconstruction of knowledge transfer should not
rest solely on the analysis of citation
networks. Citation analysis does, however, have the
advantage of being able to use the mathematical
calculations of SNA and thereby achieves results
that hermeneutic historiography alone cannot.
Moreover, the standard methods of information
retrieval use the textual similarities between
documents to show users of specifically retrieved
texts further texts that might be relevant for them
(see section 6). More recent approaches
to information retrieval also embrace other
bibliometric regularities (Mutschke et al. 2011),
co-author patterns, and journal distributions.

Relations (edges, links) in a network of nodes
can be captured mathematically in a square
adjacency matrix $A$ whose elements $a_{ij}$ are
different from zero if node $i$ is related to node $j$.
If we do not differentiate between relations of
different strengths, then $a_{ij}$ has a value of 1 if a
relationship exists.

In his above-mentioned article, Derek J. 
de Solla Price (1965) presented the adjacency 
matrix for citations between articles in a
self-contained bibliography on N-rays, whereby the
ones were symbolized with dots and the places 
that would have been filled by the zeros were left 
blank (de Solla Price 1965, p. 514: figure 6). 2 He 
ordered the articles in temporal sequence accord- 
ing to their date of publication and omitted cita- 
tions of all sources external to the bibliography. 
The adjacency matrix for the first twelve articles 
is thus 
Because future articles could not be cited, ones
occur in rows of this matrix only to the left of (or
below) the main diagonal. Thus, because of the
temporal sequencing of the citation network, de
Solla Price's adjacency matrix takes the form of a
triangle; along the main diagonal and to the right
of (or above) it occur only zeros ($a_{ij} = 0, \forall j \ge i$).
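The triangular shape can be checked on a small example. The sketch below builds a binary adjacency matrix for twelve articles and tests that nothing lies on or above the main diagonal. The edge list is an assumption: it is reconstructed from the citation relations described in the text of this article, not transcribed from de Solla Price's figure, and the function names are our own.

```python
# Citation edges (citing article, cited article) among twelve articles.
# ASSUMPTION: reconstructed from the relations described in the text,
# not copied from de Solla Price's original matrix.
EDGES = [
    (2, 1), (3, 1), (3, 2), (4, 1), (4, 2),
    (6, 5), (7, 5), (8, 7), (9, 8),
    (10, 8), (10, 9), (11, 8), (11, 9),
    (12, 8), (12, 9), (12, 10),
]
N = 12


def adjacency_matrix(edges, n):
    """Return an n x n binary matrix with a[i][j] = 1 if article i+1 cites article j+1."""
    a = [[0] * n for _ in range(n)]
    for citing, cited in edges:
        a[citing - 1][cited - 1] = 1
    return a


def is_strictly_lower_triangular(a):
    """True if every entry on or above the main diagonal is zero."""
    n = len(a)
    return all(a[i][j] == 0 for i in range(n) for j in range(i, n))


A = adjacency_matrix(EDGES, N)

# Print the matrix in de Solla Price's style: dots for ones, blanks for zeros.
for row in A:
    print("".join("." if x else " " for x in row))
print("strictly lower triangular:", is_strictly_lower_triangular(A))
```

Because the articles are numbered in order of publication, any edge list in which articles cite only earlier articles passes this check.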

In figure 1, the graph of the first twelve articles
in the citation network disaggregates into two
partial graphs. Each of these graphs represents
independently achieved results which could only
at a subsequent point in time (namely, with the
appearance of the first overview article on N-rays,
number 75 in the bibliography) be interpreted
as belonging together. 4 The adjacency matrix
$A$ of a citation network can be used to model a
reader's behavior moving from article to article by
following the cited sources. For example, a reader
may begin with article 12; the starting time thus
noted is $t = 0$. He/she can be described by the
column vector $r(0)$, which contains eleven zeros
and a single one as the twelfth component. By
multiplying this vector from the left with the transpose
of $A$, one finds the reader at articles 8, 9, and 10.
In the next step, by means of the rule $r \leftarrow A^T r$,
he/she wanders from there to articles 7, 8, and 9,
and so forth.

2 N-rays turned out to be fictitious; so, for that reason,
the bibliography qualifies as self-contained. This concrete
example of a citation graph will accompany us through the
subsequent sections of this paper.

3 Temporally ordered networks are acyclic. Within
acyclic graphs, there is no path along the directed links
which loops back to return to the starting point.

4 This expresses itself as co-citation (cf. section 4).
We can refine this model to one of the random
reader, 5 whereby the components of the vector
are scaled to sum to one: $R = r / \sum_i r_i = r / r_+$.
Thus, for a given time $t = 2$, for example, the
components of the reader vector different from zero
will be $R_7(2) = 1/4$, $R_8(2) = 1/2$, $R_9(2) = 1/4$. By
scaling to one, we can interpret these fractions as
a probability, namely, the probability that, at time
$t = 2$, we would encounter the reader at article 7,
8, or 9, should he/she, upon completion of his/her
reading, randomly select one of the sources cited.
We have a double chance of finding the reader at
article 8 because he/she can arrive at 8 via 9 as
well as 10. This makes it clear why scaling to one
allows us now to designate the reader as a random
reader.
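The reader model can be traced step by step in a few lines. The following sketch applies the rule $r \leftarrow A^T r$ twice, starting at article 12, and then scales the result to sum to one. The edge list is an assumption reconstructed from the relations described in this article, and the helper names `step` and `normalize` are hypothetical.

```python
from fractions import Fraction

# Citation edges (citing article, cited article).
# ASSUMPTION: reconstructed from the relations described in the text.
EDGES = [
    (2, 1), (3, 1), (3, 2), (4, 1), (4, 2),
    (6, 5), (7, 5), (8, 7), (9, 8),
    (10, 8), (10, 9), (11, 8), (11, 9),
    (12, 8), (12, 9), (12, 10),
]
N = 12


def step(r, edges):
    """Apply r <- A^T r once: pass each article's weight on to every source it cites."""
    nxt = [0] * len(r)
    for citing, cited in edges:
        nxt[cited - 1] += r[citing - 1]
    return nxt


def normalize(r):
    """Scale the vector so that its components sum to one (R = r / r_+).

    This divides accumulated path counts by their total, as in the text.
    """
    total = sum(r)
    return [Fraction(x, total) for x in r]


r0 = [0] * N
r0[11] = 1              # the reader starts at article 12 at t = 0
r1 = step(r0, EDGES)    # t = 1
r2 = step(r1, EDGES)    # t = 2
R2 = normalize(r2)

print("t=1:", [i + 1 for i, x in enumerate(r1) if x])
print("t=2:", {i + 1: str(p) for i, p in enumerate(R2) if p})
```

Exact fractions are used instead of floating-point numbers so that the components can be compared directly with the values quoted in the text.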

Today, with the aid of citation indexes, readers 
are not only able to search for articles in the refer- 
ence lists of cited sources retrospectively, but they 
can also navigate temporally forward in citation 
networks. We can model this process by using the
transpose of the transposed adjacency matrix, that
is, $A$ itself, $(A^T)^T = A$, because reflection along
the main diagonal reverses all of the arrows in
the graph of a directed network, which is easy to
show.

5 This refers to the random surfer model upon which
Brin and Page (1998) based their PageRank algorithm.

The adjacency matrix $A$ gives the direct paths
between the nodes in the network along the
directed edges. For the model of the reader, we
used powers of $A$ (or $A^T$): $A^1, A^2, A^3, \ldots$. For
example, if we compare $A^2$ with the graph in figure
1, we see that the components of $A^2$ indicate how
many (indirect) paths of length 2 there are
between the nodes; for instance, there are two such
paths between node 8 and node 12 (between the
remaining pairs of nodes there is at most one path
of length 2). In general, then, a matrix $A^k$
contains the number of paths of length $k$ between the
vertices.
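Counting paths with matrix powers can be illustrated as follows. The sketch squares a binary adjacency matrix using plain Python lists and reads off the number of paths of length 2 from article 12 to article 8. The edge list is again an assumption reconstructed from the relations described in this article, and `matmul` is a hypothetical helper.

```python
# Citation edges (citing article, cited article).
# ASSUMPTION: reconstructed from the relations described in the text.
EDGES = [
    (2, 1), (3, 1), (3, 2), (4, 1), (4, 2),
    (6, 5), (7, 5), (8, 7), (9, 8),
    (10, 8), (10, 9), (11, 8), (11, 9),
    (12, 8), (12, 9), (12, 10),
]
N = 12


def adjacency_matrix(edges, n):
    """Return an n x n binary matrix with a[i][j] = 1 if article i+1 cites article j+1."""
    a = [[0] * n for _ in range(n)]
    for citing, cited in edges:
        a[citing - 1][cited - 1] = 1
    return a


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*y))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in x]


A = adjacency_matrix(EDGES, N)
A2 = matmul(A, A)  # A2[i][j] counts paths i+1 -> k -> j+1 of length 2

print("paths of length 2 from 12 to 8:", A2[11][7])
print("largest entry elsewhere:",
      max(A2[i][j] for i in range(N) for j in range(N) if (i, j) != (11, 7)))
```

Higher powers are obtained by repeated multiplication, for example `matmul(A2, A)` for paths of length 3.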

3 Bibliographic Coupling 

In accordance with Kessler (1963), two articles are
said to be bibliographically coupled if at least one 
cited source appears in the bibliographies or ref- 
erence lists of both articles. If we look for such 
bibliographic couplings in the graph of the first 
twelve articles on N-rays (figure 1), we
discover a number of these. Nodes 2 and 3 are
bibliographically coupled over node 1, as are nodes 2
and 4. Nodes 3 and 4 are coupled over node 1 and 
over node 2. Nodes 6 and 7 are coupled over node 
5. Nodes 9 through 12 are coupled in pairs over 
node 8, and nodes 10 to 12 are coupled in pairs 
additionally over node 9. 

If a reader desires to locate a bibliographically 
coupled article in a citation network, he/she must 
first move one step along an arrow to the next 
vertex, and then, from that node, in the opposite
direction of the next arrow. As known from
graph theory, this process is described by the
matrix $B = AA^T$. Its element $b_{ij}$, in accordance with
the rules for matrix multiplication, is the scalar
product of rows $i$ and $j$ of $A$. Because $A$ is
a binary matrix, summation results in the number
of matching components of both row vectors, that
is, the number of common sources.

Thus, element $b_{ij}$ indicates how many
bibliographic couplings exist between articles $i$ and $j$.
In other words, $b_{ij}$ gives the number of paths of
length 2 via which one moves from $i$ along the
arrow and then to $j$ in the opposite direction. The
symmetry of the coupled pairs corresponds to that
of the matrix, $B = B^T$. The main diagonal
contains the number of bibliographic self-couplings
of each article, namely, the number of all of its
references to other articles in the network.
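As a minimal sketch of this calculation, the code below forms $B = AA^T$ as scalar products of the rows of a binary adjacency matrix. The edge list is an assumption reconstructed from the couplings described in this article, and the helper `b(i, j)` is a hypothetical convenience for 1-based article numbers.

```python
# Citation edges (citing article, cited article).
# ASSUMPTION: reconstructed from the relations described in the text.
EDGES = [
    (2, 1), (3, 1), (3, 2), (4, 1), (4, 2),
    (6, 5), (7, 5), (8, 7), (9, 8),
    (10, 8), (10, 9), (11, 8), (11, 9),
    (12, 8), (12, 9), (12, 10),
]
N = 12


def adjacency_matrix(edges, n):
    """Return an n x n binary matrix with a[i][j] = 1 if article i+1 cites article j+1."""
    a = [[0] * n for _ in range(n)]
    for citing, cited in edges:
        a[citing - 1][cited - 1] = 1
    return a


def coupling_matrix(a):
    """B = A A^T: entry (i, j) is the scalar product of rows i and j of A."""
    return [[sum(x * y for x, y in zip(row_i, row_j)) for row_j in a] for row_i in a]


A = adjacency_matrix(EDGES, N)
B = coupling_matrix(A)


def b(i, j):
    """Coupling strength of articles i and j, using 1-based article numbers."""
    return B[i - 1][j - 1]


print("articles 3 and 4 share", b(3, 4), "sources")
print("article 12 has", b(12, 12), "references inside the network")
```

The diagonal entry `b(i, i)` is simply the length of article i's reference list within the network, which matches the remark on self-couplings above.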

Citation databases enable users to move tem- 
porally forward, backward, and laterally (by zig- 
zagging). Thematically similar articles appearing 
in the same volume of a journal are often tempo- 
rally so proximate that the earlier article cannot 
be cited in the later one. But they also reveal their 
likeness through similar reference lists, that is, 
through strong bibliographic coupling. As early
as the late 1980s, with the old CD-ROM edition of
the Science Citation Index, the user was directed
from an article which he/she located to the twenty
most strongly bibliographically coupled articles
via the "related records" option. 6 The strength
of the coupling of two articles $i$ and $j$ is defined
here simply by the number of references that the
articles have in common, as given by the element
$b_{ij}$ of matrix $B$.

In our example of the bibliography on N-rays, 
we can only include in our analysis the citation re- 
lations between articles in the N-ray bibliography, 
although clearly other sources have been cited in 
the articles' reference lists, which do not belong 
to the bibliography. In an alternative approach, 
we can analyze a complete body of scientific litera- 
ture of a publication period together with all cited 
sources. Using the SCI we could, for instance, an- 
alyze the publications in one specific year. Rather 
than selecting a thematic excerpt or segment from 
the citation graph, we then consider all of the 
journal articles of that year with all of their cited 
sources, including books, patents, newspaper
articles, etc. Accordingly, the resultant matrix is
not a square adjacency matrix $A$ whose rows and
columns represent the same vertices, but rather a
rectangular matrix, each of whose rows represents
an article and each of whose columns represents
a cited source. Only a few of the journal articles
for that particular year will appear as a source.
Thus we arrive at a citation network consisting of
just two types of nodes: articles and sources. Note
that the articles carry the same publication year,
and the sources can be from any year. Within
this network, only edges between nodes of
different type are permitted. Networks of this type
are also called bipartite; the rectangular matrix
is termed an affiliation matrix. 7

For the rectangular affiliation matrix $A$ with
$m$ rows (articles) and $n$ columns (sources), we
can also calculate the matrix of bibliographic
couplings $B = AA^T$. Matrix $B$ is also square in this
case and contains one row and one column for
each of the $m$ articles.
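The bipartite case can be sketched with a toy example. The reference lists below are entirely hypothetical (three articles citing four sources of mixed type); the code builds the rectangular affiliation matrix and shows that $B = AA^T$ is square with one row and one column per article.

```python
# HYPOTHETICAL reference lists: article -> set of cited sources.
REFERENCE_LISTS = {
    "article_a": {"book_1", "paper_x", "patent_7"},
    "article_b": {"paper_x", "patent_7"},
    "article_c": {"book_2"},
}


def affiliation_matrix(reference_lists):
    """Return (articles, sources, matrix) with one row per article and one column per source."""
    articles = sorted(reference_lists)
    sources = sorted(set().union(*reference_lists.values()))
    matrix = [[1 if s in reference_lists[a] else 0 for s in sources] for a in articles]
    return articles, sources, matrix


def coupling_matrix(a):
    """B = A A^T: entry (i, j) is the number of sources shared by articles i and j."""
    return [[sum(x * y for x, y in zip(row_i, row_j)) for row_j in a] for row_i in a]


articles, sources, A = affiliation_matrix(REFERENCE_LISTS)
B = coupling_matrix(A)

print("A is", len(A), "x", len(A[0]))
print("B is", len(B), "x", len(B[0]))
for name, row in zip(articles, B):
    print(name, row)
```

Sorting the article and source names only serves to fix a reproducible row and column order; any consistent ordering would do.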

6 This use of bibliographic coupling for information
retrieval is still part of the on-line edition of the Science
Citation Index, now part of the Web of Knowledge (see
http://wokinfo.com/).

7 From the Latin ad-filiare, meaning to adopt as a son.

Two articles, both with long reference lists,
could contain many sources in common, in which
case they would be said to be strongly
bibliographically coupled. Articles with only a few references,
therefore, would tend to be more weakly
bibliographically coupled, if coupling strength is
measured simply according to the number of
references articles contain in common. This suggests
that it might be more practicable to switch to a
relative measure of bibliographic coupling, which
we can define most simply by using set theory.
In reference lists, each cited source appears only
once; thus we can see a reference list as a set of
cited sources. In set theory language we formulate
this thus:

$$b_{ij} = |R_i \cap R_j|,$$

that is, the element $b_{ij}$ of matrix $B$ equals the size
of the intersection of the reference lists in articles
$i$ and $j$. The Jaccard index (or Jaccard similarity
coefficient) gives us a relative measure of the
overlap of two sets: 8

$$J_{ij} = \frac{|R_i \cap R_j|}{|R_i \cup R_j|}. \qquad (1)$$

The Jaccard index of bibliographic coupling is
zero if the intersection of the reference lists is
empty; it reaches a maximum of one if both lists
are identical (because, in this case, the
intersection would be equal to the union).

Another relative measure of similarity between
sets, which we can use here, is the so-called Salton
index: 9

$$S_{ij} = \frac{|R_i \cap R_j|}{\sqrt{|R_i|\,|R_j|}}. \qquad (2)$$

In this case, the size of the intersection is related
to the geometric mean of the sizes of both sets.
Here, too, the index reaches a maximum of one for
identical sets and a minimum of zero for disjoint
ones.
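Both relative measures are easy to compute when reference lists are treated as sets. The sketch below implements equations (1) and (2) directly; the function names are our own, and returning zero for empty reference lists is an assumption made to avoid division by zero, since the text does not discuss that case.

```python
import math


def jaccard(r_i, r_j):
    """Jaccard index: size of the intersection divided by size of the union."""
    union = r_i | r_j
    if not union:
        return 0.0  # ASSUMPTION: two empty reference lists count as uncoupled
    return len(r_i & r_j) / len(union)


def salton(r_i, r_j):
    """Salton index: size of the intersection divided by the geometric mean of the set sizes."""
    if not r_i or not r_j:
        return 0.0  # ASSUMPTION: an empty reference list couples with nothing
    return len(r_i & r_j) / math.sqrt(len(r_i) * len(r_j))


# Reference lists of articles 10 and 12 as described in the N-rays example.
refs_10 = {8, 9}
refs_12 = {8, 9, 10}

print("absolute coupling:", len(refs_10 & refs_12))
print("Jaccard:", jaccard(refs_10, refs_12))
print("Salton:", salton(refs_10, refs_12))
```

For two sets of unequal size with one contained in the other, as here, the Salton index is larger than the Jaccard index, because the geometric mean of the sizes is smaller than the size of the union.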

8 The Swiss botanist and plant physiologist Paul
Jaccard (1868-1944) defined this index in 1901.

9 The computer scientist Gerard Salton (1927-1995),
who lived and worked in the United States, was one
of the pioneers in the area of information retrieval (cf.
Wikipedia). This index was introduced in Salton and
McGill (1983). In section 3.6 of the above-cited
bibliometrics textbook, Havemann (2009) shows how Salton and
McGill define their index alternatively as the cosine of the
angle between the row vectors of matrix $A$.

4 Co-citation Networks

We speak of the co-citation of two articles when
both are cited in a third article. Thus, co-citation
can be seen as the counterpart of bibliographic
coupling. Returning to our example, we can also
find a number of instances of co-citation among
the first twelve articles on N-rays (figure 1):
articles 1 and 2 are co-cited twice (in articles 3 and
4); articles 8 and 9 are co-cited three times (in
articles 10, 11, and 12); and article
10 is co-cited once with article 8 and once with
article 9 (in article 12).

In the previous section we explained how the
bibliographic coupling matrix $B$ can be obtained
from the scalar products of the row vectors of the
adjacency matrix $A$. Now we have to construct
the scalar products of the column vectors of
$A$ in order to calculate the elements of the
co-citation matrix $C$:

$$c_{ij} = \sum_k a_{ki} a_{kj}.$$

Since $A$ is a binary matrix, the summation yields
the number of common components of the respective
column vectors, that is, the number of cases in
which both articles appear in the same row or reference
list. Written compactly, we calculate $C = A^T A$.
Thus our model reader moves within the graph
first in the opposite direction of the arrow and
then, in a second step, with the arrow. Like
matrix $B$, matrix $C$ is also symmetric. The main
diagonal of $C$ contains the number of cases in which
an article is co-cited with itself (which is the case
for every citation). The number $c_{ii}$ is therefore
the number of all citations of article $i$ in other
articles in the network.
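The co-citation counts can be computed in the same way as the couplings, with columns in place of rows. The sketch below forms $C = A^T A$ for a binary adjacency matrix; the edge list is an assumption reconstructed from the relations described in this article, and `c(i, j)` is a hypothetical helper for 1-based article numbers.

```python
# Citation edges (citing article, cited article).
# ASSUMPTION: reconstructed from the relations described in the text.
EDGES = [
    (2, 1), (3, 1), (3, 2), (4, 1), (4, 2),
    (6, 5), (7, 5), (8, 7), (9, 8),
    (10, 8), (10, 9), (11, 8), (11, 9),
    (12, 8), (12, 9), (12, 10),
]
N = 12


def adjacency_matrix(edges, n):
    """Return an n x n binary matrix with a[i][j] = 1 if article i+1 cites article j+1."""
    a = [[0] * n for _ in range(n)]
    for citing, cited in edges:
        a[citing - 1][cited - 1] = 1
    return a


def cocitation_matrix(a):
    """C = A^T A: entry (i, j) is the scalar product of columns i and j of A."""
    cols = list(zip(*a))
    return [[sum(x * y for x, y in zip(col_i, col_j)) for col_j in cols] for col_i in cols]


A = adjacency_matrix(EDGES, N)
C = cocitation_matrix(A)


def c(i, j):
    """Number of articles citing both i and j, using 1-based article numbers."""
    return C[i - 1][j - 1]


print("articles 1 and 2 are co-cited", c(1, 2), "times")
print("articles 8 and 9 are co-cited", c(8, 9), "times")
print("article 8 is cited", c(8, 8), "times in the network")
```

Because `cocitation_matrix` works on columns, it applies unchanged to a rectangular affiliation matrix, where it yields one row and one column per cited source.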

Most of the elements in the co-citation matrix
$C$ of our example are equal to zero. This
is so because the content-related connections
between both citation strands only became apparent
as co-citations of these papers with the
publication of the first overview article on N-rays
(number 75 in the N-rays bibliography and thus
not visible in figure 1). Co-citation relations,
unlike bibliographic couplings, are subject to change.
Many content-related connections may only
be recognized by later authors at some
subsequent point in time or, conversely, may
be deemed no longer essential by later authors.

As with the bibliographic coupling of articles, 
it is equally reasonable for purposes of co-citation 
analysis to consider all of the journal articles of 
a year with all of their cited sources. We will 
now analyze the bipartite network of articles and 
sources introduced in the previous section not
according to how the articles are coupled over
sources, but rather just the opposite: that is,
how co-citation in articles connects the sources
to one another. We can also calculate matrix $C$
in accordance with the bipartite network, yielding
$C = A^T A$.

In the co-citation analysis of two successive 
volumes of a bibliographic database, many co- 
cited sources in the first year's papers will ap- 
pear again as co-cited sources in the second. Cou- 
pling strength will vary, however; many new co- 
cited pairs of sources will join the old ones. The 
reference lists for a given year's papers, practi- 
cally speaking, are analogous to the results of an 
opinion survey designed to ascertain which cited 
sources are currently seen to be related to one an- 
other. 

The principle of co-citation was first applied by 
Irina Marshakova (1973) in Moscow in a study 
on laser physics. Independently of Marshakova,
the principle was also proposed by Henry Small
(1973), a scientist from the Institute for Scien- 
tific Information (ISI) in Philadelphia, founded by 
Eugene Garfield in 1960. Since the 1970s, the co- 
citation perspective has been at the core of bib- 
liometric analyses of specialties, fields, scientific 
schools, or paradigms in the sense of Kuhn — in 
other words, co-citation has provided the main 
thrust underlying our understanding of the social 
and cognitive theoretical structure of science (De 
Bellis 2009). 

All bibliometric networks can be visualized or 
modeled graphically. (We will come back to this 
in section 8 below.) Historically, co-citation anal- 
ysis data were used for the mapping of science 
and for the first Atlas of Science project (Garfield 
1981). For such purposes, the network of co-cited 
sources is calculated or derived from the reference 
lists of publications from a single year's papers, 
whereby only those sources are considered the ci- 
tation numbers of which exceed a certain thresh- 
old value (e.g., a threshold value of 5). These 
sources were seen by Henry G. Small (1978) as 
concept symbols. In a network thus constructed, 
the next step is to try to determine clusters of 
sources, whose members have a strong relation- 
ship to each other, but relate only weakly to 
sources in other clusters. 10 

10 In network analysis, such clusters of nodes are also 
called communities. 



To determine clusters of similar objects in ac- 
cordance with the above criteria, a set of algo- 
rithms was developed. If we wish to apply these 
algorithms, we first have to decide which measure 
of co-citation strength between two sources best 
applies: the absolute number of co-citations or, 
for instance, one of the two relative measures pre- 
sented above — the Jaccard or Salton indexes. 

Relative measures of co-citation result in a weak 
coupling strength among frequently cited sources, 
which are only rarely co-cited. This seems appro- 
priate because many of the citing authors do not 
see a closer relationship between those cited con- 
cept symbols. At the ISI, Henry Small first started 
working with the Jaccard index and later with 
the Salton index (Small and Sweeney 1985). Irina 
Marshakova (1973) did not use a relative measure. 
Instead she calculated the expected values for co- 
citation numbers, based upon the independence of 
both citation processes, and accepted only those 
numbers that exceeded the expected values signif- 
icantly. 
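The idea of comparing observed with expected co-citation numbers can be illustrated with a simple calculation. The sketch below is one plausible reading only: it assumes that, if two sources are cited independently of each other among a fixed number of citing articles, the expected number of co-citations is the product of their citation counts divided by the number of articles. Neither this formula nor the function name is taken from Marshakova's paper, and the significance criterion she applied is not reproduced here.

```python
def expected_cocitations(citations_i, citations_j, n_articles):
    """Expected co-citation count under independence.

    ASSUMPTION: each source is cited by a random subset of the n_articles
    citing papers, independently of the other source, so that
    E = n * (c_i / n) * (c_j / n) = c_i * c_j / n.
    """
    return citations_i * citations_j / n_articles


# HYPOTHETICAL numbers: 1000 citing articles, two sources cited 50 and 40 times,
# and 12 observed co-citations.
expected = expected_cocitations(50, 40, 1000)
observed = 12

print("expected under independence:", expected)
print("observed / expected:", observed / expected)
```

In this made-up case the observed count is several times the expected one, which is the kind of excess the approach described above looks for.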

In order to get from clusters of concept symbols 
to a map, the next step is aggregation. Here, we
take all of the nodes in one cluster and draw them 
together into a point. Then, we take all of the 
links between each set of two clusters and draw 
these together into a single link of a determined 
strength that corresponds to the specific factual 
distance between those clusters. In this way, we 
create a network of clusters that can be visual- 
ized. The technique of multidimensional scaling 
or MDS has been frequently used to visualize com- 
plex networks on a two-dimensional plane. Today, 
networks are often visualized using force-directed
placement or FDP.
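The aggregation step can be sketched as follows. Given co-citation strengths between sources and an assignment of sources to clusters, the code collapses each cluster into one node and merges all links between two clusters into a single link. Summing the individual strengths is an assumption, since the text does not fix how the strength of the merged link is determined; the data and names are hypothetical.

```python
from collections import defaultdict

# HYPOTHETICAL co-citation strengths between pairs of sources.
LINKS = {
    ("s1", "s2"): 5,
    ("s1", "s3"): 1,
    ("s2", "s4"): 2,
    ("s3", "s4"): 4,
}

# HYPOTHETICAL result of a clustering step: source -> cluster label.
CLUSTER_OF = {"s1": "X", "s2": "X", "s3": "Y", "s4": "Y"}


def aggregate_links(links, cluster_of):
    """Merge all links between two clusters into one link.

    ASSUMPTION: the merged strength is the sum of the individual strengths.
    Links inside a cluster disappear because the cluster becomes a single node.
    """
    merged = defaultdict(int)
    for (u, v), weight in links.items():
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:
            merged[tuple(sorted((cu, cv)))] += weight
    return dict(merged)


cluster_links = aggregate_links(LINKS, CLUSTER_OF)
print(cluster_links)
```

Applying the same function again to the cluster network, with clusters assigned to higher-level clusters, gives the next level of aggregation.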

Since the clusters of nodes thus produced, in
turn, represent the vertices in a co-citation
network, we can now use an analogous clustering
procedure to generate clusters of clusters and so on,
until all of the cited sources of a given year's
papers are united in a single cluster that represents
that part of science indexed by the database used.

Co-citation analysis can be applied not only to
articles. We can also inquire, for example, how
often authors are cited together in order to model
or map the structure of the expert community.
Or we can analyze co-citations of entire
journals (see section 5). In neither case,
however, can we expect that aggregations of
papers represent just one specific subject. Authors,
for instance, usually deal with several themes,
sometimes simultaneously, which can also belong
to different specialized areas of expertise. Another
problem of author co-citation is the growing
tendency toward more research collaboration, which
becomes visible in the increasing number of
authors per article in some fields. The thematic
flexibility of authors leads in fact to a real problem:
If we wish to determine or define authors' areas
of activity, thematic flexibility turns out to be a
crucial factor whenever we seek to trace dynamic
processes in science (on this matter, see section 8
below).

5 Citation Networks 
of Journals 

Research has been able to expand so much over 
the centuries only because new areas of exper- 
tise have continually opened up, and the various 
research tasks have been distributed accordingly 
among the expert communities representing these 
new fields of scientific endeavor. Each community 
has created its own specialized journal. Alongside
these specialized journals, there are journals
in which research results of general interest are
published. Currently, the open-access movement
is changing the journal landscape profoundly. 11 Still,
we can assume that the foundation of a journal
is a response to a communicative need of a
scientific community in one field, one country or across
several.

But even the most highly specialized journals 
contain more than just those articles contributed 
by the experts in their respective fields of research: 
these periodicals also contain articles from other 
research areas that could be of interest for any 
reader of a particular journal at a given time. The 
result of this is that the literature from one sci- 
entific area is not just to be found in the core 
journals of that area, but rather that it is broadly 
scattered according to Bradford's law (Bradford 
1934). 

Nevertheless, despite the addition of literature 
external to a core research field and in accordance 
with the Porphyrian tree of knowledge, articles 
in one journal should cite the articles of journals 
from contiguous fields more often than they would 
those from fields or disciplines further away. The 

11 see the Public Knowledge Project http://pkp.sfu.ca/ 



early study by Gross and Gross (1927) was already
based on this plausible assumption. They determined
the number of citations of other journals
in the general chemistry publication Journal
of the American Chemical Society — for the pur- 
pose of providing librarians with data relevant 
for library journal selection. However, number- 
of-journal-citations data gathered in this way are 
not just influenced by thematic proximity or dis- 
tance, but also simply by the quantity and quality 
of the articles in the journal referenced. Journals 
with similar topics compete for the articles that 
are most important for further research. For that 
reason, articles submitted to a journal for publica- 
tion must undergo a qualitative appraisal process, 
the so-called peer review. The bigger a journal's 
reputation, the more articles it will be offered, and 
hence the more rigorous its review process is likely 
to be. Thus scientific periodicals differ not only 
according to area of expertise but also according 
to reputation. 

In sum, then, the citation flows in a network 
of scientific journals are influenced by three main 
factors: thematic contiguity, the size of a journal, 
and the reputation of a journal. A further influencing
variable is the usual number of cited sources per
article in the specialty in question.

We can correct for journal size by relating the 
number of citations to the corresponding number 
of articles available to be cited, as Garfield and 
Sher (1963) did when they introduced the journal 
impact factor (JIF). In order to take account of 
the different citation behavior customary in dif- 
ferent fields, Pinski and Narin (1976) suggested 
that instead of relating the number of journal cita- 
tions to the number of citable articles, this number 
should be related to the total number of references 
in all of the cited journal's articles. This is tan- 
tamount to a kind of import-export relationship 
that would also take into account that review ar- 
ticles with long reference lists are on average cited 
more frequently than original articles publishing 
research results. Thus journal citation networks, 
constructed with such a normalization, will just 
mirror the actual thematic relationships of those 
periodicals and their respective reputations. 

Compared to article citation networks, journal
networks naturally have substantially fewer
nodes and are, for that reason, not only more
transparent, but also lend themselves more easily
to numeric analysis. The citation numbers necessary
to construct a journal network are presented
in aggregated form in the Journal Citation
Reports of the Science Citation Index and the Social
Sciences Citation Index. In the following, we
present two examples of journal network analyses.

5.1 Citation Flows
Between Journals

First, we will examine different variants of a net- 
work consisting of five information science jour- 
nals: (1) Information Processing and Manage- 
ment, (2) Journal of the American Society for In- 
formation Science and Technology, (3) Journal of 
Documentation, (4) Journal of Information Sci- 
ence, and (5) Scientometrics. 

We begin by constructing a network that 
has been weighted with reciprocal citation num- 
bers (including self-citations by the journals in the 
network). With data from the Social Sciences Edi- 
tion of the Journal Citation Reports, we derive the 
following adjacency matrix for the citation win- 
dow 2006 and the publication window 2002-2006: 



$$
A = \begin{pmatrix}
79 & 65 & 15 & 6 & 24 \\
42 & 182 & 11 & 15 & 44 \\
6 & 22 & 37 & 8 & 6 \\
20 & 26 & 13 & 30 & 11 \\
7 & 48 & 7 & 10 & 254
\end{pmatrix} \qquad (3)
$$



Matrix A contains only elements $a_{ij} \neq 0$, because
all five journals cite each other as well as them- 
selves. The main diagonal contains the journal 
self-citations. In the graph of A, edges (links) be- 
tween all the five vertices flow in both directions. 
Loops represent self-citations. 

Let us now, like Pinski and Narin (1976), switch
to a network in which the number of citations $a_{ij}$ of
journal j by journal i is divided by the sum $a_{j+}$
of the references made by journal j within the
five-journal network, yielding $\gamma_{ij} = a_{ij}/a_{j+}$.

Pinski and Narin go even further. They argue 
that citation in a prestigious journal should count 
more than citation in a less important one. But 
precisely because they measure prestige itself by 
number of citations (per cited source), they end up
with a recursive concept of that notion, like the 
concept of prestige that has been debated since 
the 1940s for social network analysis (Wasserman 



and Faust 1994). In social networking, it is not 
only important how many people one knows; one 
must know the "right" people, namely, those who 
know a lot of others who are the "right" ones. 

How do we conceptualize this recursive notion
of prestige mathematically? To this end, we will
examine a model of prestige redistribution for the
five information science journals under consideration
here. To begin ($t = 0$), all five journals
should have the same weight, so we fix this at 1.
The weights are written as a column vector. Analogous
to our procedure for the reader model in
an article citation network, we then multiply the
column vector from the left with the transpose of
the adjacency matrix, $w(1) = \gamma^T w(0)$. In this way,
weights are redistributed within the network. The
new column vector contains exactly the row sums
of $\gamma^T$, i.e. $w_j(1) = \gamma_{+j} = a_{+j}/a_{j+}$, the
import-export ratios of journals. By repeating this procedure
many times over, we can see that the weights
iteratively approach fixed limit values. In accordance
with this, for $t \to \infty$, the following equation
holds: 13

$$w = \gamma^T w.$$

This means that the weights determined by the it- 
eration fulfill an equation which can be seen as an 
expression of the recursive definition of prestige, 
viz., that the weight of journal j results from the 
citation relations to all of the other journals (as 
well as to itself), according to how close these rela- 
tions are, factored by the weight of the other jour- 
nals. Pinski and Narin call this influence weight. 
Determination equations of this type are also re- 
ferred to as bootstrap relations. 

Important to note here is that the iteration pro- 
cedure is somewhat incorrectly labeled "redistri- 
bution." The sum of the five weights after the 
first iteration step is 4.76 < n = 5. In other words, 
weight is lost (because the import-export relations 
of the larger journals are more propitious than 
those of the smaller ones). However, this discrepancy
can be corrected through scaling; for $t \to \infty$,
we obtain the normalized components 0.76, 1.33, 
1.03, 0.64, and 1.25. By scaling to n, it becomes 
patently clear who the winners and losers of re- 
distribution are. 
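
The iteration described above can be sketched in a few lines of Python with NumPy. This is only a minimal illustration: the matrix entries are our reading of equation 3 and should be checked against the source data, and the variable names (`A`, `gamma`, `w_first`, `w_scaled`) as well as the number of iteration steps are assumptions made for this sketch, not part of Pinski and Narin's procedure.

```python
import numpy as np

# Citation matrix as read from equation 3:
# A[i, j] = number of citations of journal j by journal i
A = np.array([
    [79,  65, 15,  6,  24],
    [42, 182, 11, 15,  44],
    [ 6,  22, 37,  8,   6],
    [20,  26, 13, 30,  11],
    [ 7,  48,  7, 10, 254],
], dtype=float)
n = A.shape[0]

references = A.sum(axis=1)  # a_{j+}: references given by journal j (row sums)
citations = A.sum(axis=0)   # a_{+j}: citations received by journal j (column sums)

# gamma[i, j] = a_ij / a_{j+}: column j is divided by the row sum of journal j
gamma = A / references[np.newaxis, :]

w0 = np.ones(n)             # t = 0: all journals start with weight 1
w_first = gamma.T @ w0      # t = 1: import-export ratios a_{+j} / a_{j+}

# Repeat the multiplication; 500 steps is an assumed, generous number
w = w_first.copy()
for _ in range(500):
    w = gamma.T @ w

# Rescale so that the weights sum to n again
w_scaled = w * n / w.sum()

print("sum after first step:", round(w_first.sum(), 2))
print("scaled limit weights:", np.round(w_scaled, 2))
```

With the matrix as given here, the first-step weights sum to roughly 4.76 and the scaled limit values agree, to two decimals, with the components quoted in the text.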



12 The actual data can be found in the book by Have- 
mann (2009). 



13 That this relationship holds not just in our special case, 
but in every case, is guaranteed by the theory of eigenval- 
ues. Matrices of type 7 T have a principal (maximal) eigen- 
value of 1. and the iteration determines the corresponding 
eigenvector. 
Following Pinski and Narin, Nancy Geller
(1978) considered a kind of modified redistribution
of journal weights. Her starting point was
the theory of Markov chains, a special class of
stochastic or random processes in which (just
as it is for our case here) the situation at time
t + 1 is completely determined by the situation
at time t. Instead of using $\gamma$, she took a somewhat
differently scaled matrix $\gamma^*$ with the elements
$\gamma^*_{ij} = a_{ij}/a_{i+}$, whose transpose belongs to
the stochastic matrices. Stochastic matrices have
the property that they leave the sum of vector elements
unchanged. Such matrices describe true redistribution;
they are therefore well-suited for the
description of random processes in which probability
is redistributed over possible system states.
Nancy Geller's algorithm is also interesting because
only a few steps separate it
from Google's PageRank algorithm as presented
in the textbook by Havemann (2009, section 3.3).
This is an example of a method developed for citation
networks that later also became useful for Web
analysis.
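
The difference between the two scalings can be made concrete with a small sketch. It shows only the row-wise scaling and the sum-preserving property of the resulting stochastic matrix, not Geller's full algorithm; the matrix entries are again our reading of equation 3, and the names `gamma_star`, `P`, and `sums` are assumptions of this illustration.

```python
import numpy as np

# Citation matrix as read from equation 3 (A[i, j] = citations of j by i)
A = np.array([
    [79,  65, 15,  6,  24],
    [42, 182, 11, 15,  44],
    [ 6,  22, 37,  8,   6],
    [20,  26, 13, 30,  11],
    [ 7,  48,  7, 10, 254],
], dtype=float)
n = A.shape[0]

# gamma_star[i, j] = a_ij / a_{i+}: every row is divided by its own row sum
gamma_star = A / A.sum(axis=1, keepdims=True)

# The transpose is column-stochastic: each of its columns sums to 1
P = gamma_star.T

w = np.ones(n)
sums = []
for _ in range(500):        # assumed number of steps
    w = P @ w
    sums.append(w.sum())    # stays at n: weight is moved around, never lost

print("column sums of P:", P.sum(axis=0))
print("weight sum after iteration:", round(sums[-1], 6))
print("limit weights:", np.round(w, 2))
```

In contrast to the iteration with $\gamma^T$, no rescaling is needed here, because the total weight is conserved at every step.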

5.2 Citation Environments 
of Individual Journals 

If we consider citation matrices for large groups of 
journals, we see that many cells in such a matrix 
are empty and that citation tends to be confined 
to smaller more densely networked groups (Ley- 
desdorff 2007). This is not surprising; it mirrors 
the real-world design and relations of scientific 
specialty areas and disciplines. Nevertheless, the 
delineation of areas remains a problem which has 
not been satisfactorily resolved: the borders are 
fluid. 

Leydesdorff (2007) suggested another method 
for determining the position of individual jour- 
nals within the mass of and relative to all of the 
areas of science. The starting point is an ego net- 
work; let us take, for example, the journal Social 
Networks. Social Networks has two citation envi- 
ronments. The first consists of those journals, in 
a specific time frame, that cite articles in Social 
Networks from a unique time period. We could 
call this group or set of journals the awareness 
area, spillover area, or influence area of Social 
Networks — in other words, its citation impact en- 
vironment. The second environment consists of 
those journals that are cited in articles in Social 



Networks — that is, its knowledge base. For each 
group of journals, then, it is possible to carry out 
the following analysis independently. We ascer- 
tain all of the reciprocal citation links for each 
respective group with the aid of the Journal Cita- 
tion Reports. Social Networks, our starting point, 
is a member of both environments. For these cita- 
tion environments we obtain asymmetric matrices 
like in equation 3 (p. 9). 

By applying the process described above, we 
obtain groups of journals that have similar cita- 
tion behaviors; these groups can thus be inter- 
preted or defined as specialty areas. The po- 
sition of the journal whose ego network was at 
the starting point gives us information about aspects
of interdisciplinarity (for example, by applying
betweenness centrality) and a possible interface
function (Leydesdorff 2007). If we repeat
this analysis over a sequence of years, the some- 
times changing function of a journal in an equally 
dynamic journal environment becomes visible. In 
the case of Social Networks, what was revealed was 
that this journal's functioning as a possible bridge 
between traditional areas of social science and new 
methods and approaches from network theory in 
physics could be reduced to a single year, namely, 
2004 (Leydesdorff and Schank 2008). 

6 Lexical Coupling
and Co-Word Analysis

For computer-supported information retrieval 
(IR), documents are characterized by the terms 
used in them. A manageable (not too big) but 
nevertheless informative set of such terms, com- 
prised mainly of keywords supplied by authors or 
indexers, or significant words in a document's ti- 
tle, can serve well for IR. In the extreme, all of 
the words in a document can be taken into ac- 
count. What is interesting then is the frequency 
with which these words occur, not counting stop 
words like "the," "and," "of," and others. The 
appearance of terms in a collection or set of docu- 
ments is described by the term-document matrix 
A, whose element $a_{ij}$ tells us how often
the term j occurs in document i.

The term-document matrix, like the matrix in- 
troduced above consisting of documents and cited 
sources (see section 3), describes a bipartite net- 
work, namely, one of terms and documents. In 
this case, however, by taking into account the frequency
with which terms occur in a document, we
weight the network.

The so-called vector-space model of IR not only 
reveals the similarity between documents based on 
the terms used in them (this is lexical coupling), 
it also shows the relationships between the terms 
based on their common usage in the documents 
(this is the basis of co-word analysis). The former
corresponds to bibliographic coupling of articles; 
the latter refers to the co-citation of sources (sec- 
tion 4). 
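
Both projections of the bipartite term-document network can be obtained by matrix multiplication. The following sketch uses a small hypothetical term-document matrix (the terms, counts, and variable names are invented for illustration); the cosine normalization shown at the end is one common choice in the vector-space model, not the only possible one.

```python
import numpy as np

# Hypothetical term-document matrix: rows = documents, columns = terms,
# A[i, j] = how often term j occurs in document i
terms = ["network", "citation", "journal", "author"]
A = np.array([
    [2, 1, 0, 0],   # document 0
    [1, 0, 3, 0],   # document 1
    [0, 0, 1, 2],   # document 2
])

# Lexical coupling: documents linked through shared terms (documents x documents)
lexical_coupling = A @ A.T

# Co-word network: terms linked through shared documents (terms x terms)
co_word = A.T @ A

def cosine(M):
    """Cosine similarity between the rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return (M @ M.T) / (norms * norms.T)

doc_similarity = cosine(A)

print(lexical_coupling)
print(co_word)
print(np.round(doc_similarity, 2))
```

Documents 0 and 2 share no term, so their coupling strength and their cosine similarity are both zero, while the terms "journal" and "author" are linked because they co-occur in document 2.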

At the beginning of the 1980s Michel Callon 
and his group in France proposed co-word anal- 
yses as an alternative or supplement to prevail- 
ing co-citation methods (Callon, Courtial, Turner, 
and Bauin 1983). So-called "cognitive sciento- 
metrics" is supposed to make the relationships 
between documents clearly evident where, for 
whatever reason, reciprocal citation has occurred 
only sparsely. Anthony van Raan's group in the 
Netherlands developed interactive tools for co- 
word based science maps in the context of eval- 
uations. These maps can make the activities of 
institutions or countries in specific fields of science 
visible (Noyons 2004). Clusters in these networks 
were interpreted as themes or topics, and tempo- 
rally hierarchically branched trees provided some 
insight into the dynamics of scientific areas (Rip 
and Courtial 1984). 

Co-word analyses can be understood as the 
empirical method of the so-called actor-network 
theory, a social theory in the field of science 
and technology studies (Latour 2005; De Bellis 
2009). Despite more recent and promising text- 
based network analyses for identifying innovation- 
relevant scientific research — some researchers even 
speak of literature-based discoveries (Swanson 
1986; Kostoff 2007) — expert knowledge for the in- 
terpretation of text-based agglomerations is still 
necessary. And, despite decades-long efforts by 
many groups, as Howard White put it succinctly 
in a 2007 discussion, we are still not able to an- 
alyze and visualize the development and change 
of scientific paradigms or scientific controversies 
in such a way that they are accessible to non-experts. 14



14 Howard White, 11th International Conference of Scientometrics
and Informetrics, Madrid 2007, workshop on
mapping, personal notes.



This deficit may be due in part to the ambiva- 
lence of language. In a study by Leydesdorff and 
Hellsten (2006), the authors pointed to the signif- 
icance of context for words, and they suggested 
returning to the text-document matrices for word 
analyses as well, in order to gain more complete 
information. Probably the answer also lies in the 
clever selection of a base unit for statistical proce- 
dures. An analysis of developing discussion focal 
points in online forums has shown that, already 
in one single post, contributors brought up or re- 
ferred to different topics. Therefore, in this partic- 
ular case, the choice of sentences as the base unit 
for statistical network analyses produced better 
results than did the longer text passages of one 
post (Prabowo, Thelwall, Hellsten, and Scharn- 
horst 2008). 

In text mining, terms or phrases are extracted
from sources (such as keywords, titles, abstracts,
or full texts) by a variety of algorithms, and
statistical measures of frequency and correlation
(including network indicators) are then applied.
For these measures, the reference unit (a phrase,
a sentence, or a document) is an important parameter.
The plethora of combinations
of these elements presents a great challenge to text
analysis and text mining, a challenge which can
only be addressed through strong networking of all
text-based structural searches including semantic
Web research (van der Eijk, van Mulligen, Kors,
Mons, and van den Berg 2004).

One method of information retrieval devel- 
oped for extracting topics from corpora, which 
uses both types of links in bipartite networks of 
documents and terms — namely, co-word analysis 
and lexical coupling — is latent semantic analysis 
(LSA) proposed by Deerwester, Dumais, Furnas, 
Landauer, and Harshman (1990). This method is
based on singular value decomposition (SVD) of
the term-document matrix. By means of SVD, bipartite
networks of articles and cited sources can
also be analyzed thematically (Janssens, Glanzel,
and De Moor 2008; Mitesser, Heinz, Havemann,
and Glaser 2008). 
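
The core step of LSA, a truncated SVD of the term-document matrix, can be sketched as follows. The matrix is a hypothetical toy example with two obvious topics, and the choice of k = 2 retained dimensions is an assumption made for this illustration; real applications use much larger matrices and usually apply a term weighting first.

```python
import numpy as np

# Hypothetical term-document matrix (rows = documents, columns = terms):
# documents 0 and 1 use terms 0 and 1, documents 2 and 3 use terms 2 and 3
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

# Singular value decomposition A = U diag(s) Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of latent dimensions kept (assumed for this example)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation of A

doc_coords = U[:, :k] * s[:k]        # documents in the latent space
term_coords = Vt[:k, :].T * s[:k]    # terms in the latent space

print("singular values:", np.round(s, 2))
print(np.round(A_k, 2))
```

In the reduced space, documents 0 and 1 receive identical coordinates although their term frequencies differ, which is the sense in which the decomposition groups documents (and terms) by topic.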

7 Co-authorship Networks 

Co-authorship is considered an indicator of cooperation.
If two or more authors share the responsibility
for a publication presenting particular research
results, then, in the course of the research
process that led to those results, these individuals
ought to have worked together in some way or
another at one time.

It is appropriate and important to agree on 
a concept of cooperation before the structure 
of co-authorship networks is analyzed or inter- 
preted (Sonnenwald 2007). To discuss this in- 
depth, however, would exceed the scope of this 
article, so we refer the reader to the definition of 
research collaboration in the aforementioned bibliometrics
textbook (Havemann 2009, section 3.7),
where the author relies on the work of Grit Laudel
(1999, pp. 32-40); see also Laudel (2002).

Not every form of collaboration finds its ultimate
expression in a co-authored work. Conversely,
a certain tendency to arbitrarily assign
co-authorship to individuals (or force it on
them) cannot be denied. Even so,
shared responsibility for an article in a renowned
journal is only rarely possible without some form
of cooperation. The co-authors, one would expect,
at least know each other, 15 and this realistic
expectation is what makes the analysis of co-authorship
networks interesting.

Co-authorship networks are usually introduced 
as networks of authors. In its simplest form the 
co-authorship network is unweighted. A link be- 
tween two authors exists, if both appear together 
as authors of at least one publication in the bib- 
liography or body of literature under investiga- 
tion. Weighting the link with the number of ar- 
ticles in which both appear together as authors 
suggests itself, but this kind of modeling still uses 
only part of the information about cooperation 
that can actually be extracted from a bibliogra- 
phy. What it fails to capture is whether the rela- 
tionships between authors are purely bilateral or 
whether these researchers work together in larger 
groups. If, for example, three authors cooperate 
on one article, then between them in total three 
links of weight 1 can be found; this is exactly the 
same as if three pairs of them had each published 
one article together (in total, then, three articles).

Another aspect which could be taken into account
is the sequence in which the co-authors are
listed. Although different communities follow
different rules on whom to place first and last, it
has been proposed to include information on author
sequence in bibliometric indicators (Galam

15 The likely exception here being articles with 100 or 
more authors. 



2011). More complete information is used if co-authorship
is presented as a bipartite network of
authors and articles. Element $a_{ij}$ of the affiliation
matrix A is equal to 1 if author i appears among the
authors listed for article j; otherwise it is equal to
zero.

Up to now, most investigations have confined
themselves to co-authorship networks in which
only authors are represented as nodes and publications
are not. The adjacency matrix B of such
a network can be calculated from the affiliation
matrix A, whereby $B = AA^T$. This becomes immediately
apparent if we carry our deliberations
on networks of bibliographically coupled articles
(section 3) over to the authors-articles network.
In a bipartite network, a co-author is someone
whom we reach whenever we take a step toward
a publication ($A^T$) and then back again to an author
(A). We derive the co-authorship figures for
two authors from the scalar product of the (binary)
row vectors of A. The diagonal element $b_{ii}$
in matrix B is therefore equal to the number of
publications to which author i was a contributor.
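
A short sketch may help to see both the projection $B = AA^T$ and the information that is lost by it. The two affiliation matrices below are hypothetical: in the first, three authors write one article together; in the second, each of the three possible pairs writes one article. The helper names `coauthorship` and `off_diagonal` are assumptions of this example.

```python
import numpy as np

# Hypothetical affiliation matrices: rows = authors, columns = articles
# Case 1: three authors write a single article together
A_joint = np.array([
    [1],
    [1],
    [1],
])

# Case 2: the three possible pairs of authors each write one article
A_pairs = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
])

def coauthorship(A):
    """B = A A^T: off-diagonal = joint articles, diagonal = articles per author."""
    return A @ A.T

def off_diagonal(B):
    """Link weights only, with the diagonal set to zero."""
    return B - np.diag(np.diag(B))

B_joint = coauthorship(A_joint)
B_pairs = coauthorship(A_pairs)

print(B_joint)
print(B_pairs)
print("same link weights:", np.array_equal(off_diagonal(B_joint), off_diagonal(B_pairs)))
```

The link weights between the authors are identical in both cases; only the diagonal (one article per author versus two) and, of course, the bipartite matrices themselves still distinguish group work from purely bilateral cooperation.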

So, how are co-authorship networks structured? 
First of all, we frequently find that an overwhelm- 
ing majority of specialty area authors (more than 
80 %) are grouped together in a single component 
of the network, namely, the so-called main com- 
ponent. The remaining authors, conversely, of- 
ten comprise only small groups of researchers con- 
nected to one another (at least indirectly) over 
co-authorship links. All of the distances occur- 
ring between the cooperation partners — whether 
these are functional (subject- related), institu- 
tional, geographical, language-related, cultural, or 
political — cannot prevent the emergence of one 
big interrelated network of cooperating scientists. 

Equally noteworthy, given the size of these networks,
are the very short distances between authors
in the main components of co-authorship networks.
In a co-authorship network consisting of
more than one million biomedical science authors
whose combined output for the period 1995-99
was more than two million published articles (indexed
in Medline), the statistical physicist Mark
E. J. Newman (2001a) found that over 90 % of the
authors were grouped together in the main component. 16
The second largest component of this

16 Because authors in bibliographic databases are not al- 
ways clearly identifiable, the figures vary according to the 
method of identification used. Newman found that, if he 
network contained only 49 authors. The maximal
distance between two authors in the main compo- 
nent (also called the network diameter) is a matter 
of only 24 steps or hops: that is, on the shortest 
path between two arbitrary nodes, lie a maximum 
of 23 other nodes. The average length of all of the 
shortest paths between nodes in the main compo- 
nent is less than five hops (Newman 2001b): 

This "small world" effect, first described 
by Milgram, 17 is, like the existence of a giant 
component, 18 probably a good sign for sci- 
ence; it shows that scientific information — 
discoveries, experimental results, theories — 
will not have far to travel through the net- 
work of scientific acquaintance to reach the 
ears of those who can benefit by them. 
(Newman 2001b, p. 3) 

Newman's statement restricts itself to the in- 
formal communication of results between re- 
searchers, which so often precedes formal publi- 
cation. This observation confirms one of the two 
behavioral principles of science Manfred Bonitz 
(1991) formulated early in the history of sciento- 
metrics. They read as: 

Holographic principle: Scientific infor- 
mation "so behaves" that it is eventually 
stored everywhere. Scientists "so behave" 
that they gain access to their information 
from everywhere. 

Maximum speed principle: Scientific in- 
formation "so behaves" that it reaches its 
destination in the shortest possible time. 
Scientists "so behave" that they acquire 
their information in the shortest possible 
time. 

Newman's observation confirms the second of 
Bonitz' principles. 

The famous "small world" experiment referred 
to above, undertaken by social psychologist Stan- 
ley Milgram (1967) for the acquaintance network 

took into account authors' full names, the main component 
would comprise 1.5 million different authors; however, if 
he included the surnames and only initials for first name, 
then this figure shrank to just under 1.1 million individu- 
als. For statistical analysis purposes, this ambiguity is only 
of minor importance; but it strongly impairs the system- 
atic pursuit of individual research. Newer methods depend 
on a combination of names, addresses, channels of publica- 
tion, and citation behavior, in order to automatically and 
unambiguously ascribe researcher identification. 

17 Milgram (1967) 

18 Newman (2001a) 



in the US, resulted in an average distance of six 
hops. 19 Newman explains the small-world effect
taking himself as an example: he has 26 co-authors
who, in turn, author publications together
with a total of 623 other researchers.

The "radius" of the whole network 
around me is reached when the number 
of neighbors within that radius equals the 
number of scientists in the giant component 
of the network, and if the increase in num- 
bers of neighbors with distance continues 
at the impressive rate [. . . ] it will not take 
many steps to reach this point. (Newman 
2001b, p. 3) 
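
The quantities used in the preceding paragraphs (main component, diameter, average shortest path length) can all be obtained by breadth-first search. The following sketch uses a tiny hypothetical co-authorship network; the graph and the function names are assumptions of this example. For networks of the size Newman analyzed, one would rely on an optimized graph library rather than on this plain implementation.

```python
from collections import deque

# Hypothetical co-authorship network as adjacency sets
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
    "f": {"g"},
    "g": {"f"},
}

def bfs_distances(graph, source):
    """Number of hops from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def components(graph):
    """Connected components, largest first."""
    seen, result = set(), []
    for node in graph:
        if node not in seen:
            comp = set(bfs_distances(graph, node))
            seen |= comp
            result.append(comp)
    return sorted(result, key=len, reverse=True)

main = components(graph)[0]

# All shortest-path lengths between distinct nodes of the main component
lengths = [d for u in main for v, d in bfs_distances(graph, u).items() if v != u]

diameter = max(lengths)
average_path_length = sum(lengths) / len(lengths)
share_in_main = len(main) / len(graph)

print(sorted(main), diameter, average_path_length, round(share_in_main, 2))
```

The dictionary returned by `bfs_distances` also gives the number of neighbors at each distance from a chosen author, which is the growth Newman refers to in the passage quoted above.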

The number of co-authors that an author has 
is a measure of his/her interconnectedness. It 
is equal to the number of his/her links (degree 
of the node) in the co-authorship network. In a 
given specialty area, marginal or peripheral au- 
thors have only a few cooperation partners. In 
network analysis, therefore, the degree of a node 
is also a measure of its centrality. Often the distri- 
bution of co-authorship is skewed: a few authors 
have many co-authors, many authors have only a 
few co-authors (Newman 2001a, p. 5). 

Another measure of centrality is the betweenness
of a given node i. Betweenness is defined as
the total number of shortest paths between arbitrary
pairs of nodes that run through node i.
It is conceivable, then, that nodes with higher values
of betweenness are responsible for shorter distances
in networks and also for the emergence of
large main components. And, in terms of betweenness
centrality, a number of frontrunners
also set themselves clearly apart from the remaining
authors (Newman 2001b, p. 2).
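
Both centrality measures can be computed for a small hypothetical network as sketched below. Degree is simply the number of neighbors. For betweenness we use the accumulation scheme of Brandes for unweighted graphs; note that in this commonly used variant, a pair of nodes joined by several shortest paths contributes fractionally to the nodes on those paths, which refines the simple count given in the text. The graph and the function name are assumptions of this example.

```python
from collections import deque

# Hypothetical co-authorship network (main component only)
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

# Degree centrality: number of co-authors
degree = {node: len(neighbors) for node, neighbors in graph.items()}

def betweenness(graph):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma)
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies, farthest nodes first
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was visited from both ends
    return {v: value / 2 for v, value in score.items()}

between = betweenness(graph)
print(degree)
print(between)
```

In this toy network, node c has both the highest degree and the highest betweenness, while node d has a modest degree but a high betweenness because it is the only connection to node e.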

Finally, the different roles and functions of re- 
searchers in co-authorship networks is a topic that 
has begun to receive increasing attention. Lam- 
biotte and Panzarasa (2009) have recently con- 
sidered the importance of a researcher's function 
(viz., having a high level of connectivity and authority
within a community versus being a facilitator
or communicator between communities) for
the dissemination of new ideas. 

19 Milgram's subjects had the task of sending a letter, 
via their acquaintances, as close as possible to a recipient 
unknown to themselves. The letters that actually reached 
the targeted individual had been forwarded on average by 
six persons (including the subject). 
8 Outlook

Paul Otlet, the pioneer of knowledge organization,
bequeathed to us a wealth of drawings of the evolving
universe of knowledge (van den Heuvel and
Rayward 2011). Among them, he depicted the 
main classes of his decimal classification scheme 20 
as longitudes of a globe of knowledge (van den 
Heuvel and Rayward 2005; van den Heuvel 2008). 
Katy Borner has inspired an artistic visualization 
of the sciences as a growing "organism" (Borner 
and Scharnhorst 2009). Recent efforts toward a geography
or geology of the expanding globe of scientific
literature have led to a revival of "science
maps." The Atlas of Science, presented by the ISI 
in the early 1980s (Garfield 1981) has been fol- 
lowed by a New Atlas of Science (Borner 2010). 21 
Nevertheless scientists are still struggling to find 
adequate imagery or, as the case may be, an ap- 
propriate mathematical model to represent the 
deeper understanding we have of the dynamic pro- 
cesses of evolving science networks. 

Against this backdrop of the "new network science",
the future of bibliometric network analyses
lies in the incorporation of methods and tools from 
other disciplinary areas. Visualizations represent 
a possible platform for interdisciplinary encoun- 
ters (Borner 2010). These new "maps of science" 
have met with similar controversy to that which 
we know from the history of geographical maps. 
It therefore comes as no surprise that mappers 
of science have sought to build bridges to car- 
tography. If we apply the methods of structure 
identification (self-organizing maps) as they were
developed for complexity research (Agarwal and 
Skupin 2008), then it is just a small step from the 
question of possible models to the explication of 
complex structure formation.



20 Universal Decimal Classification (UDC), see also
http://udcc.org/

21 The Atlas of Science by Katy Borner captures parts of a
remarkable long-term project called "Places and Spaces",
which is a growing exhibition of science maps. This exhibition
is particularly interesting, first, because the public
invitation and selection process enables highly varied and
unique depictions to achieve greater visibility through public
display and, second, because the themes of the yearly
iterations embrace such a broad and diverse spectrum of
subject matter, ranging, for example, from the history of
cartography, through maps as deceptive or fantasy pictures,
to maps drawn by children. Many of the maps on display
are based on network data. For more information, see
http://www.scimaps.org.



For bibliometric networks, next to the tradi- 
tional topic of structure identity, the topic of 
structure formation — and with it, the dimension 
of time — steps more forcefully into the foreground 
of interest. All of the methods we have used up to 
now in the global cartography of science or knowl- 
edge landscapes (Scharnhorst 2001) confirm the 
existence of collectively generated self-organizing 
structures in the form of scientific disciplines and 
scientific communities. On the macro-level, these 
patterns are so persistent that, as Klavans and 
Boyack (2009) have recently shown, they mani- 
fest themselves recurrently, relatively independent 
of the respective research methods. 22 

It is more difficult to find general patterns or 
regularities on the micro-level for the interactions 
between authors or for those between authors 
and documented knowledge. What is still missing is a
deeper understanding of the dynamic mechanisms 
that describe the emergence of a science land- 
scape and individual navigation within it. We 
expect dynamic mathematical models to repro- 
duce already known statistical bibliometric laws 
like Lotka's law of scientific productivity (Lotka 
1926) or Bradford's law of scattering (Bradford 
1934), mentioned above. Complexity research 
with notions like energy, entropy, or fitness land- 
scapes (Scharnhorst 2001) offers a rich repertoire 
of contemporary analytical methods and models 
which can do justice to the network character of 
complex systems (Fronczak, Fronczak, and Holyst 
2007). Recent empirical research in this area is
devoted to contemporary effects in evolving bibliometric
networks (Borner, Maru, and Goldstone
2004), the search for burst phenomena (Chen,
Chen, Horowitz, Hou, Liu, and Pellegrino 2009), and
the application of epidemic models to the dissemination
of ideas (Bettencourt, Kaiser, and Kaur
2009). On the level of conceptual models, too,
philosophy and sociology of science have embraced
the idea of an epistemic landscape and mathematical
models for the search behavior of researchers
in it (Payette 2012; Edmonds, Gilbert, Ahrweiler,
and Scharnhorst 2011).

Topic delineation on both the micro and the
macro level is a pertinent problem of science
studies in general and bibliometrics in particu- 
lar. However, the problem of topic delineation 



22 The ring structure of the consensus map bears astounding
similarity to Otlet's historical visions of a globe
of knowledge.
in citation-based networks of papers might be
a consequence of the overlap of thematic struc- 
tures. The overlap of themes in publications is 
well known to science studies. A recent study by 
Havemann, Glaser, Heinz, and Struck (2012) on 
the meso level tests three local approaches to the 
identification of overlapping communities devel- 
oped in the last years (Fortunato 2010). 

The heuristic value of dynamic models lies in 
their broad range of available possible mecha- 
nisms, forms of interaction, and feedback, which 
can be connected to distinctive types of structure 
formation. Thus, for example, the topically rele- 
vant notion of field mobility 23 can be drawn upon 
for the explication of growth processes in compet- 
ing scientific areas, for which, in addition, schools 
of knowledge as sources of self-accelerating growth 
are also highly relevant (Bruckner and Scharnhorst
1986). Through mobility, a network is created
between scientific areas, and, at the same
time, all of the elementary dynamic model mechanisms
and a quasi "metabolic network" of knowledge
systems are formed (Ebeling and Scharnhorst
2009). The wanderings of the researcher
can be followed or read from his/her self-citations.
Whereas self-referencing is usually ignored in sci- 
entometrics, these citations constitute an excel- 
lent source for the investigation and analysis of 
mobility in scientific fields. Structures in the 
self-citation network can be interpreted as thematic
areas which become visible through different
groups of keywords and co-authorships. The
wandering of an individual between research areas,
as shown by his/her lifework of collected scientific
output, can be represented as
a unique bar code pattern (Hellsten, Lambiotte,
Scharnhorst, and Ausloos 2007). This creative
fuzziness of the scientist can also be shown on science
maps as a spreading phenomenon, whereby
the scientist is depicted as a flow field rather than
as a wandering dot (Skupin 2009).
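
The idea of reading thematic areas from an author's
self-citation network can be sketched as follows. This is a
deliberately crude stand-in, not the method of the cited
study: it treats self-citations as undirected links between
one author's papers and reports the connected components as
candidate thematic groups. The paper identifiers and the
function name are hypothetical.

```python
from collections import defaultdict, deque


def thematic_groups(papers, self_citations):
    """Group one author's papers by connectivity of self-citations.

    papers: list of paper identifiers (hypothetical labels).
    self_citations: list of (citing, cited) pairs among those papers.
    Returns the connected components, each as a sorted list.
    """
    neighbours = defaultdict(set)
    for citing, cited in self_citations:
        # Direction is ignored: a self-citation links two papers thematically.
        neighbours[citing].add(cited)
        neighbours[cited].add(citing)

    seen = set()
    groups = []
    for paper in papers:
        if paper in seen:
            continue
        # Breadth-first search collects everything reachable from this paper.
        queue = deque([paper])
        seen.add(paper)
        group = []
        while queue:
            current = queue.popleft()
            group.append(current)
            for other in sorted(neighbours[current]):
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        groups.append(sorted(group))
    return groups


# Hypothetical lifework of five papers with three self-citations.
papers = ["p1", "p2", "p3", "p4", "p5"]
self_citations = [("p2", "p1"), ("p3", "p2"), ("p5", "p4")]
groups = thematic_groups(papers, self_citations)
print(groups)
```

Ordering the groups by the publication year of their papers
would give a rough picture of when the author moved from one
area to another; a real analysis would more likely use a
community detection method, since a single bridging
self-citation merges two components here.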

The use of models as generators of hypotheses
presupposes that the hypotheses have been
empirically tested and, accordingly, put into the
context of social communication, socioeconomic,
and political theories of "science qua social
system" (Glaser 2006). This connection to SNA,
which is traditionally strongly anchored in sociology
(especially the proposed inclusion of social,
cognitive, and personality attributes of actors for
modeling the evolution of networks and dissemination
phenomena within networks), and the application
of dynamic modeling approaches
in SNA provide an excellent basic framework for
using models to generate theory (Snijders, van de
Bunt, and Steglich 2010).

23 The term "field mobility" was introduced in bibliometrics
by Jan Vlachy (1978) to describe the thematic wandering
of scientists. Thematic wandering results from new
discoveries as well as other causes such as the connectivity
between scientific areas (Bruckner, Ebeling, and Scharnhorst
1990).

Semantic web applications creating new linked
repositories of data, publications, and concepts;
large-scale data mining techniques (as originating
from Artificial Intelligence); meaningful
visualizations; and mathematical models of science all
contribute to a better understanding of the science
system. However, the variety of mathematical and
conceptual models of science is as large as the
variety of possible network visualizations.
Often knowledge about data mining, modeling, and
visualizing is held by different, relatively
isolated academic tribes (Becher and Trowler 2001).
Translating and linking concepts and methods
where appropriate and possible is one remaining
task. Exploring and better understanding our
own science history is another one. Only if we succeed
in bridging unique individual science biographies
with the laws and regularities of the science
system as a whole will we be able to learn more
about the development of new ideas (knowledge
generation) and be better able to intervene in this
process in a more controlled and supportive manner.